
A Another universality result for neural oscillators

Neural Information Processing Systems

The universal approximation Theorem 3.1 immediately implies another universality result: the constructed $y(t)$ solves the ODE (2.6) with initial condition $y(0) = \dot{y}(0) = 0$. The proof rests on the reconstruction of a continuous signal from its sine transform. Step 0 (equicontinuity) recalls a standard fact from topology and introduces the odd extension

$$F(\tau) := \begin{cases} f(\tau), & \tau \geq 0, \\ -f(-\tau), & \tau < 0. \end{cases}$$

Since $F$ is odd, its Fourier transform is given by the sine transform of $f$; the details are provided below. The next step in the proof of the fundamental Lemma 3.5 needs a preliminary result, which by (B.3) and Lemma 3.4 holds for any input. By the sine transform reconstruction Lemma B.1 and by Lemma 3.6, oscillators with the required properties exist; indeed, Lemma 3.7 shows that time-delays of any given input signal can be approximated to any desired accuracy. Step 1 applies the Fundamental Lemma 3.5 together with Lemma 3.6 to obtain a suitable oscillator, and Step 3 concludes, by Lemma 3.8, with an oscillator network realizing the approximation.
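To make the sine-transform step concrete, here is the standard identity presumably underlying Lemma B.1, sketched under the Fourier convention $\hat{F}(\xi) = \int_{\mathbb{R}} F(\tau)\, e^{-i\xi\tau}\, d\tau$ (the paper's normalization may differ). For the odd extension $F$ of $f$,

$$\hat{F}(\xi) = \int_{-\infty}^{\infty} F(\tau)\, e^{-i\xi\tau}\, d\tau = -2i \int_{0}^{\infty} f(\tau)\, \sin(\xi\tau)\, d\tau,$$

so $\hat{F}$ is, up to the factor $-2i$, the sine transform of $f$, and for sufficiently regular $f$ the signal is recovered on $\tau \geq 0$ by the inversion formula

$$f(\tau) = \frac{2}{\pi} \int_{0}^{\infty} \left( \int_{0}^{\infty} f(s)\, \sin(\xi s)\, ds \right) \sin(\xi\tau)\, d\xi.$$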



Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message Passing

Zhang, Yihan, Ji, Hong Chang, Venkataramanan, Ramji, Mondelli, Marco

arXiv.org Machine Learning

We consider the problem of parameter estimation in a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design to have an arbitrary distribution of singular values and only assumes that its singular vectors are generic. It is a vast generalization of the i.i.d. Gaussian design typically considered in the theoretical literature, and is motivated by the fact that real data often have a complex correlation structure, so that methods relying on i.i.d. assumptions can be highly suboptimal. Building on the paradigm of spectrally initialized iterative optimization, this paper proposes optimal spectral estimators and combines them with an approximate message passing (AMP) algorithm, establishing rigorous performance guarantees for these two algorithmic steps. Both the spectral initialization and the subsequent AMP match existing conjectures on the fundamental limits of estimation: the former on the optimal sample complexity for efficient weak recovery, and the latter on the optimal estimation error. Numerical experiments suggest that our methods are effective, and that our theory is accurate, even beyond orthogonally invariant data.
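As a concrete illustration of the first algorithmic step, below is a minimal sketch of a spectral estimator for a GLM, written for the simpler i.i.d. Gaussian design (a special case of the orthogonally invariant setting considered in the paper). The preprocessing function T is an illustrative trimming choice, not the optimal preprocessing derived in the paper; the example uses noiseless phase retrieval, where the estimate is defined only up to a global sign.

import numpy as np

def spectral_estimate(A, y, T):
    # Top eigenvector of D = (1/n) * sum_i T(y_i) a_i a_i^T = A^T diag(T(y)) A / n.
    n, d = A.shape
    D = (A.T * T(y)) @ A / n
    eigvals, eigvecs = np.linalg.eigh(D)   # eigenvalues in ascending order
    return eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

# Noiseless phase retrieval: y_i = <a_i, x>^2 with i.i.d. Gaussian rows a_i.
rng = np.random.default_rng(0)
n, d = 2000, 100
x = rng.standard_normal(d); x /= np.linalg.norm(x)
A = rng.standard_normal((n, d))
y = (A @ x) ** 2
x_hat = spectral_estimate(A, y, T=lambda v: np.minimum(v, 3.0))  # illustrative trimming
print("overlap |<x_hat, x>| =", abs(x_hat @ x))

The overlap $|\langle \hat{x}, x \rangle|$ is the natural accuracy measure under the sign ambiguity; in the paper's pipeline, the AMP step would then refine this spectral initialization.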








Supplementary Material for "Variational Policy Gradient Method for Reinforcement Learning with General Utilities" A Related Work

Neural Information Processing Systems

We provide a more extensive discussion of the context of this work. Firstly, when closed-form expressions for the optimizer of a function are unavailable, solving optimization problems requires iterative schemes such as gradient ascent [31]. Their convergence to global extrema is predicated on concavity and on the tractability of computing ascent directions. When the objective takes the form of the expected value of a function parameterized by a random variable, stochastic approximations are required [36, 24]. The policy gradient (PG) theorem mentioned above gives a specific form for ascent directions with respect to a parameterized family of stationary policies, computed from trajectories of a Markov decision process when the objective is the expected cumulative return [44]; this gives rise to the REINFORCE algorithm.
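For concreteness, here is a minimal sketch of the REINFORCE estimator in this standard expected-cumulative-return setting (not the general utilities studied in the paper). The tabular environment interface env.reset()/env.step() is a hypothetical placeholder, and the softmax policy parameterization is one common choice.

import numpy as np

def softmax_policy(theta, s):
    # theta has shape (n_states, n_actions); returns the distribution pi(. | s).
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def reinforce_update(env, theta, gamma=0.99, lr=0.1):
    # One REINFORCE step: sample a trajectory, then ascend along
    # sum_t grad_theta log pi(a_t | s_t) * G_t, where G_t is the return from step t.
    states, actions, rewards = [], [], []
    s, done = env.reset(), False
    while not done:
        p = softmax_policy(theta, s)
        a = np.random.choice(len(p), p=p)
        s_next, r, done = env.step(a)   # hypothetical tabular-MDP interface
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G                 # return-to-go G_t
        p = softmax_policy(theta, states[t])
        grad_log = -p                              # grad of log pi(a_t | s_t) wrt logits
        grad_log[actions[t]] += 1.0
        theta[states[t]] += lr * G * grad_log      # stochastic gradient ascent
    return theta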